Abstract: Locally caching contents at the network edge constitutes
one of the most disruptive approaches in 5G wireless networks.
Reaping the benefits of edge caching hinges on solving a myriad of
challenges such as how, what and when to strategically cache
contents subject to storage constraints, traffic load, unknown
spatio-temporal traffic demands and data sparsity. Motivated by
this, we propose a novel transfer learning-based caching procedure
carried out at each small cell base station. This is done by
exploiting the rich contextual information (i.e., users’ content
viewing history, social ties, etc.) extracted from device-to-device
(D2D) interactions, referred to as the source domain. This prior
information is incorporated in the so-called target domain where
the goal is to optimally cache strategic contents at the small
cells as a function of storage, estimated content popularity,
traffic load and backhaul capacity. It is shown that the proposed
approach overcomes the notorious data sparsity and cold-start
problems, yielding significant gains in terms of users’
quality-of-experience (QoE) and backhaul offloading, with gains
reaching up to 22% in a setting consisting of four small cell base
stations.

Index Terms: caching, transfer learning, collaborative filtering,
data sparsity, cold-start problem, 5G

I. Introduction 

Caching at the network edge is one of the five most promising
innovations in 5G wireless networks [?]. Recently, it was shown
that caching can significantly offload different segments of the
infrastructure including radio access network (RAN) and core
network (CN), by intelligently storing contents closer to the
users. As opposed to pushing contents on a best-effort basis
ignoring end-users’ behavior and interactions, we are witnessing
an era of truly context-aware and proactive networking [?].
Undoubtedly, edge caching has taken recent 5G research activities
by storm as evidenced by the recent literature in both academia
and industry [?]-[?] (to cite a few).

Although caching has been well-studied in wired networks, caching
over wireless remains in its infancy. The idea of femtocaching was
proposed in [?], in which small base stations (SBSs) called helpers
with low-speed backhaul but high storage units carry out content
delivery via short-range transmissions. Randomly distributed SBSs
with storage capabilities are studied in [?], characterizing the
outage probability and average delivery rate. A
stochastic-geometry based caching framework for device-to-device
(D2D) communications is examined in [?] where mathematical
expressions of local and global fractions of served content
requests are given. From a game theoretic standpoint, various
approaches have been studied such as multi-armed bandits under
unknown content popularity [?], many-to-many matching [?] and
joint content-aware user clustering and content caching [?]. Other
works include information-theoretic studies looking at
fundamentals of local and global caching gains in [?], facility
location based approximation in [?], as well as multiple-input
multiple-output (MIMO) caching in [?], and coded caching in [?].

In [?], by exploiting spatio-social caching coupled with D2D
communication, we proposed a novel proactive networking paradigm
in which SBSs and user terminals (UTs) proactively cache contents
at the network edge. As a result, the overall performance of the
network in terms of users’ satisfaction and backhaul offloading
was improved. Therein, the proactive caching problem assumed
non-perfect knowledge of the content popularity matrix, and
supervised machine learning and collaborative filtering (CF)
techniques were used to estimate the popularity matrix leveraging
user-content correlations. Nevertheless, the content popularity
matrix remains typically large and sparse with very few user
ratings, rendering CF learning methods inefficient mainly due to
data sparseness and cold-start problems [?].

Given the fact that data sparsity and cold-start problems degrade
the performance of proactive caching, we leverage the framework of
transfer learning (TL) and recent advances in machine learning [?].
TL is motivated by the fact that in many real-world applications,
it is hard or even impossible to collect and label training data
to build suitable prediction models. Exploiting available data
from other rich information sources such as D2D interactions
(called the source domain) allows TL to substantially improve the
prediction task in the so-called target domain. TL has been
applied to various data mining problems such as classification and
regression [?]. TL methods can be mainly grouped into inductive,
transductive and unsupervised TL methods depending on the
availability of labels in the source and target domains. All these
approaches boil down to answering the following fundamental
questions: 1) what information to transfer? 2) how to transfer it?
and 3) when to transfer it? While "what to transfer" deals with
which part of the knowledge should be transferred between domains
and tasks, "when to transfer" focuses on the timing of the
operations in order to avoid negative transfer, especially when
the source and target domains are uncorrelated. On the other hand,
"how to transfer" deals with what kind of information should be
transferred between domains and tasks.

The main contribution of this work is to propose a TL-based
content caching mechanism to maximize the backhaul
offloading gains as a function of storage constraints and 
users’ content popularity matrix. This is done by learning and 
transferring hidden latent features extracted from the source 
domain to the target domain. In the source domain, we take 
into account users’ D2D interactions while accessing/sharing 
statistics of contents within their social community as prior 
information in the knowledge transfer. It is shown that the 
content popularity matrix estimation in the target domain can 
be significantly improved instead of learning from scratch with 
unknown users’ ratings. To the best of our knowledge, this is 
perhaps the first contribution of unsupervised transfer learning 
in cache-enabled small cells. 

The rest of the paper is organized as follows. The network model
under consideration is provided in Section II, accompanied with
the caching problem formulation in both source and target domains.
Section III presents the classical CF-based caching and that of
the proposed transfer learning. The numerical results capturing
the impact of various parameters on the users’ satisfaction and
backhaul offloading gains are given in Section IV. We finally
conclude and delineate future directions in Section V.


II. Network Model 

Let us assume one information system in the source domain and
another information system in the target domain. A sketch of the
network model is shown in Fig. 1.

A. Target Domain 

Let us consider a network deployment consisting of $M_{tar}$ SBSs
from the set $\mathcal{M}_{tar} = \{1, \ldots, M_{tar}\}$ and
$N_{tar}$ UTs from the set $\mathcal{N}_{tar} = \{1, \ldots, N_{tar}\}$.
Each SBS $m$ is connected to the core network via a limited
backhaul link with capacity $0 < C_m < \infty$ and each SBS has a
total wireless link capacity $C'_m$ for serving its UTs in the
downlink. We further assume that UTs request contents from a
library $\mathcal{F}_{tar} = \{1, \ldots, F_{tar}\}$, where each
content $f$ has a size of $L(f)$ and a bitrate requirement of
$B(f)$. Moreover, we suppose that users’ content requests follow a
Zipf-like distribution $P_{\mathcal{F}_{tar}}(f)$, $\forall f \in
\mathcal{F}_{tar}$, defined as [?]:

$$P_{\mathcal{F}_{tar}}(f) = \frac{\Omega}{f^{\alpha}} \qquad (1)$$

where $\Omega = \left( \sum_{i=1}^{F_{tar}} \frac{1}{i^{\alpha}} \right)^{-1}$
and $\alpha$ characterizes the steepness of the distribution,
reflecting different content popularities.
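
As a minimal sketch of the Zipf-like popularity in (1), assuming contents are indexed by their popularity rank 1, ..., $F_{tar}$, the normalized probabilities could be computed as follows. The function name `zipf_popularity` is illustrative and not part of the original work.

```python
def zipf_popularity(num_contents, alpha):
    """Return [P(1), ..., P(F)] with P(f) = Omega / f**alpha, as in (1)."""
    if num_contents < 1:
        raise ValueError("library must contain at least one content")
    # Unnormalized Zipf weights 1 / f^alpha for ranks f = 1..F
    weights = [1.0 / (f ** alpha) for f in range(1, num_contents + 1)]
    omega = 1.0 / sum(weights)  # normalization constant Omega
    return [omega * w for w in weights]


# Defaults taken from Table I: 32 contents, Zipf parameter alpha = 2
popularity = zipf_popularity(32, 2.0)
```

A larger `alpha` concentrates the probability mass on the first few ranks, which is why a small cache can already serve a large share of requests under steep popularity profiles.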
Having such a content popularity in the ordered case, the content
popularity matrix for the $m$-th SBS at time $t$ is given by
$\mathbf{P}^m(t) \in \mathbb{R}^{N_{tar} \times F_{tar}}$, where each
entry $P^m_{n,f}(t)$ represents the probability that the $n$-th user
requests the $f$-th content.

Figure 1: An illustration of the network model, which consists of
two information systems, one in the source domain and one in the
target domain. Due to the lack of prior information in the target
domain, the information extracted from users’ social interactions
and their ratings in the source domain is transferred to the
target domain.

In order to avoid any kind of bottleneck during the delivery of
users’ content requests, we assume that each SBS has a finite
storage capacity of $S_m$ and caches selected contents from the
library $\mathcal{F}_{tar}$. Thus, the amount of requests SBSs
satisfy from their local caches is of high importance to avoid
peak demands and minimize the latency of content delivery. Our
goal is to offload the backhaul while satisfying users’ content
requests, by pre-fetching strategic contents from the CN at
suitable times and caching them at the SBSs, subject to their
storage constraints. To formalize this, suppose that $D$ requests
from the set $\mathcal{D} = \{1, \ldots, D\}$ are made by users
during $T$ time slots. Then, a request $d \in \mathcal{D}$ within
time window $T$ is served immediately and is said to be satisfied
if the rate of delivery is equal to or greater than the content
bitrate, such that:

$$\frac{L(f_d)}{T'(f_d) - T(f_d)} \geq B(f_d) \qquad (2)$$

where $f_d$ is the requested content, $L(f_d)$ and $B(f_d)$ are the
size and bitrate of the content, $T(f_d)$ is the arrival time of
the request and $T'(f_d)$ the end time of delivery. Given these
definitions, the users’ average satisfaction ratio can be
expressed as:

$$\eta(\mathcal{D}) = \frac{1}{D} \sum_{d \in \mathcal{D}} \mathbb{1}\left\{ \frac{L(f_d)}{T'(f_d) - T(f_d)} \geq B(f_d) \right\} \qquad (3)$$

where $\mathbb{1}\{\cdot\}$ is the indicator function which returns
1 if the statement holds and 0 otherwise. Suppose that the
instantaneous backhaul rate for the content delivery of request
$d$ at time $t$ is given by $R_d(t) \leq C_m$, $\forall m \in
\mathcal{M}_{tar}$. Then, the average backhaul load is defined as:

Now, denote $\mathbf{X}(t) \in \{0,1\}^{M_{tar} \times F_{tar}}$ as
the cache decision matrix of SBSs, where $x_{m,f}(t)$ equals 1 if
the $f$-th content is cached at the $m$-th SBS at time $t$, and 0
otherwise. Therefore, the backhaul offloading problem can be
formally expressed as:

minimize

where $R'_d(t)$ is the instantaneous wireless link rate for request
$d$ and $\eta_{min}$ is the minimum target satisfaction ratio.
In order to solve this problem, a joint optimization of the cache
decision $\mathbf{X}(t)$ and the content popularity matrix
estimation $\mathbf{P}^m(t)$ is needed. Moreover, solving (5) is
very challenging due to:

i) limited backhaul and wireless link capacity as well as the 
limited storage capacity of SBSs, 

ii) large number of users with unknown ratings and library 
size, 

iii) SBSs need to track, learn and estimate users’ content
popularity/rating matrix $\mathbf{P}^m(t)$ for cache decision while
dealing with data sparsity.

For simplicity, we now drop the index of the SBSs and assume that
the content popularity is stationary during $T$ time slots, thus
$\mathbf{P}^m(t)$ is denoted as $\mathbf{P}_{tar}$. Moreover, for
the sake of exposition, we restrict ourselves to caching policies
in which the contents are stored during the off-peak hours, thus
$\mathbf{X}(t)$ remains fixed during the content delivery and is
represented as $\mathbf{X}$. In the following, we examine the
source domain which we exploit when dealing with the sparsity of
$\mathbf{P}_{tar}$ in the target domain.
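
The satisfaction criterion in (2) and the average satisfaction ratio in (3) can be illustrated with a short sketch. It assumes that (3) normalizes the number of satisfied requests by the total number of requests; the tuple layout and the function names are hypothetical choices made for this example only.

```python
def is_satisfied(size_bits, bitrate_bps, t_arrival, t_end):
    """Condition (2): the delivery rate L / (T' - T) must be at least B."""
    duration = t_end - t_arrival
    if duration <= 0:
        raise ValueError("delivery must end after the request arrives")
    return size_bits / duration >= bitrate_bps


def satisfaction_ratio(requests):
    """Average of the indicator in (3) over all requests."""
    if not requests:
        return 0.0
    hits = sum(1 for r in requests if is_satisfied(*r))
    return hits / len(requests)


# Hypothetical requests: (L in bits, B in bit/s, arrival time, end time)
requests = [
    (1e6, 1e6, 0.0, 0.5),  # delivered at 2 Mbit/s: satisfied
    (1e6, 1e6, 1.0, 2.0),  # delivered at exactly 1 Mbit/s: satisfied
    (1e6, 1e6, 2.0, 6.0),  # delivered at 0.25 Mbit/s: not satisfied
]
ratio = satisfaction_ratio(requests)
```

In this toy trace two of the three requests meet their bitrate requirement, so the ratio is 2/3.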


B. Source Domain

As advocated in [?], we leverage the existence of a D2D-based
social network overlay made of users’ interactions within their
social communities, referred to as the source domain in the
sequel. Specifically, this source domain contains the behaviour of
users’ interactions within their social communities, modelled as a
Chinese restaurant process (CRP) [?]. This constitutes the prior
information used in the transfer learning procedure.

In the CRP with parameter $\beta$, every customer selects an
occupied table with a probability proportional to the number of
occupants, and selects the next vacant table with probability
proportional to $\beta$. More precisely, the first customer selects
the first table with probability $\frac{\beta}{\beta} = 1$. The
second customer selects the first table with probability
$\frac{1}{1+\beta}$ and the second table with probability
$\frac{\beta}{1+\beta}$. After the second customer selects the
second table, the third customer chooses the first table with
probability $\frac{1}{2+\beta}$, the second table with probability
$\frac{1}{2+\beta}$ and the third table with probability
$\frac{\beta}{2+\beta}$. This stochastic Dirichlet process
continues until all customers select their seats, defining a
distribution over allocation of customers to tables.

In this regard, the content dissemination in the social network is
analogous to the table selection in a CRP: we view this network as
a CRP, the contents as the large number of tables, and users as
the customers. First, suppose that there exist $N_{D2D}$ users in
this network. Let $F_{D2D} = F_0 + F_h$ be the total number of
contents, in which $F_h$ represents the number of contents with
viewing histories and $F_0$ is the number of contents without
history. Denote also $\mathbf{Z}_{D2D} \in \{0,1\}^{N_{D2D} \times
F_{D2D}}$ as a random binary matrix indicating which contents are
selected by each user, where $z_{n,f} = 1$ if the $n$-th user
selects the $f$-th content and 0 otherwise. Then, it can be shown
that [?]:

$$P(\mathbf{Z}_{D2D}) = \frac{\beta^{F_h} \Gamma(\beta)}{\Gamma(\beta + N_{D2D})} \prod_{f=1}^{F_h} (m_f - 1)! \qquad (6)$$

where $\Gamma(\cdot)$ is the Gamma function, $m_f$ is the number of
users assigned to content $f$ (i.e., viewing history) and $F_h$ is
the number of contents with viewing histories with $m_f > 0$.

In the target domain, the caching problem boils down to estimating
the content popularity matrix which is assumed to be largely
unknown, yielding degraded performance (i.e., very low cache hit
ratios, slow convergence, etc.). Moreover, this degradation can be
more severe in cases where the number of users and library size is
extremely large. Therefore, in order to handle these issues and
cache contents more efficiently, we propose a novel proactive
caching procedure using transfer learning which exploits the rich
contextual information extracted from users’ social interactions.
This caching procedure is shown to yield more backhaul offloading
gains compared to a number of baselines, including random caching
and the classical CF-based estimation methods [?].
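
To make the CRP of the source domain concrete, the following sketch seats users sequentially according to the rule described in this section (an occupied content is chosen proportionally to its number of users, a new content proportionally to the concentration parameter) and evaluates the logarithm of the standard CRP seating probability. The function names and the choice of 32 users are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def sample_crp(num_users, beta, rng):
    """Seat users one by one; return how many users selected each content."""
    counts = []  # counts[f] = number of users that selected content f
    for n in range(num_users):
        # Occupied content f has weight counts[f]; a new content has weight beta
        r = rng.random() * (n + beta)
        chosen = None
        for f, m in enumerate(counts):
            if r < m:
                chosen = f
                break
            r -= m
        if chosen is None:
            counts.append(1)  # open a new table (a content with no history yet)
        else:
            counts[chosen] += 1
    return counts


def crp_log_probability(counts, beta):
    """Log of beta^F_h * Gamma(beta) / Gamma(beta + N) * prod_f (m_f - 1)!."""
    n = sum(counts)
    log_p = len(counts) * math.log(beta) + math.lgamma(beta) - math.lgamma(beta + n)
    log_p += sum(math.lgamma(m) for m in counts)  # lgamma(m) = log((m - 1)!)
    return log_p


rng = random.Random(0)
counts = sample_crp(32, 2.0, rng)
log_p = crp_log_probability(counts, 2.0)
```

For two users and a concentration parameter of 2, `crp_log_probability([1, 1], 2.0)` corresponds to a probability of 2/3 and `crp_log_probability([2], 2.0)` to 1/3, matching the seating probabilities of the second customer.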


III. Transfer Learning: Boosting Content
Popularity Matrix Estimation

First, we start by explaining the classical CF-based learning, 
then detail our proposed TL solution. 




A. Classical CF-based Learning 


The classical CF-based estimation procedure is composed of a
training and a prediction phase. In the training part, the goal is
to estimate the content popularity matrix $\mathbf{P}_{tar} \in
\mathbb{R}^{N_{tar} \times F_{tar}}$, where each SBS constructs a
model based on the already available information (i.e., users’
content ratings). Let $\mathcal{N}_{tar}$ and $\mathcal{F}_{tar}$
represent the set of users and contents associated with $N_{tar}$
users and $F_{tar}$ contents. In particular, $\mathbf{P}_{tar}$ with
entries $P_{tar,ij}$ is the (sparse) content popularity matrix in
the target domain. $\mathcal{R}_{tar} = \{(i,j,r) : r = P_{tar,ij},
P_{tar,ij} \neq 0\}$ denotes the set of known user ratings. In the
prediction phase, in order to predict the unobserved ratings in
$\mathcal{N}_{tar}$, low-rank matrix factorization techniques are
used to estimate the unknown entries of $\mathbf{P}_{tar}$. The
objective here is to construct a $k$-rank approximate popularity
matrix $\mathbf{P}_{tar} \approx \mathbf{N}_{tar}^T \mathbf{F}_{tar}$,
where the factor matrices $\mathbf{N}_{tar} \in \mathbb{R}^{k \times
N_{tar}}$ and $\mathbf{F}_{tar} \in \mathbb{R}^{k \times F_{tar}}$ are
learned by minimizing the following cost function:

$$\underset{\mathbf{N}_{tar}, \mathbf{F}_{tar}}{\text{minimize}} \sum_{(i,j) \in \mathbf{P}_{tar}} \left( \mathbf{n}_i^T \mathbf{f}_j - P_{tar,ij} \right)^2 + \mu \left( \|\mathbf{N}_{tar}\|_F^2 + \|\mathbf{F}_{tar}\|_F^2 \right) \qquad (7)$$

where the sum is over the $(i,j)$ user/content pairs in the
training set. In addition, $\mathbf{n}_i$ and $\mathbf{f}_j$
represent the $i$-th and $j$-th columns of $\mathbf{N}_{tar}$ and
$\mathbf{F}_{tar}$ respectively, and $\|\cdot\|_F$ denotes the
Frobenius norm. In (7), the parameter $\mu$ provides a balance
between regularization and fitting training data. Unfortunately,
users may rate very few contents, causing $\mathbf{P}_{tar}$ to be
extremely sparse, and thus (7) suffers from severe over-fitting
issues and engenders poor performance.
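
A regularized low-rank factorization of this kind is commonly fitted by stochastic gradient descent over the observed ratings. The sketch below is one such illustration in plain Python and is not the solver used in the paper: the learning rate, the number of epochs, the initialization range and the toy ratings are all assumptions, and the regularizer is applied once per observed rating, which is a common SGD variant rather than the exact penalty in the cost function.

```python
import random


def factorize(ratings, num_users, num_contents, k=2, mu=0.01,
              lr=0.01, epochs=2000, seed=0):
    """Fit P ~ N^T F on observed (i, j, r) triples by SGD.

    N[i] and F[j] hold the k-dimensional columns n_i and f_j.
    """
    rng = random.Random(seed)
    N = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(num_users)]
    F = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(num_contents)]
    for _ in range(epochs):
        for i, j, r in ratings:
            err = sum(a * b for a, b in zip(N[i], F[j])) - r
            for d in range(k):
                n_id, f_jd = N[i][d], F[j][d]
                # Gradient step on the squared error plus the L2 penalty
                N[i][d] -= lr * (err * f_jd + mu * n_id)
                F[j][d] -= lr * (err * n_id + mu * f_jd)
    return N, F


def rmse(ratings, N, F):
    """Root mean squared error over the observed ratings."""
    total = 0.0
    for i, j, r in ratings:
        total += (sum(a * b for a, b in zip(N[i], F[j])) - r) ** 2
    return (total / len(ratings)) ** 0.5


# Toy ratings (user, content, rating) generated from a rank-1 structure
ratings = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 4.0),
           (1, 2, 6.0), (2, 0, 1.5), (2, 2, 4.5)]
N0, F0 = factorize(ratings, 3, 3, epochs=0)  # untrained factors
N, F = factorize(ratings, 3, 3)
rmse_before, rmse_after = rmse(ratings, N0, F0), rmse(ratings, N, F)
```

With only a handful of ratings per user, the fitted factors reproduce the training entries but say little about unobserved ones, which is the over-fitting issue that motivates the transfer learning step.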


B. TL-based Content Caching 

To alleviate data sparsity, solving (7) can be done more
efficiently by exploiting and transferring the vast amount of
available user-content ratings (i.e., prior information) from a
different-yet-related source domain. Formally speaking, the source
domain is associated with a set of $N_{D2D}$ users and $F_{D2D}$
contents denoted by $\mathcal{N}_{D2D}$ and $\mathcal{F}_{D2D}$
respectively. Additionally, the user-content popularity matrix in
the source domain is given by the matrix $\mathbf{P}_{D2D} \in
\mathbb{R}^{N_{D2D} \times F_{D2D}}$. Likewise, let
$\mathcal{R}_{D2D} = \{(i,j,r) : r = P_{D2D,ij}, P_{D2D,ij} \neq 0\}$
represent the set of observed user ratings in the source domain.
The underlying principle of the proposed approach is to smartly
"borrow" carefully-chosen user social behavior information from
the source domain to better learn the target domain.

The transfer learning procedure from the source domain to the
target domain is composed of two interrelated phases. In the first
phase, a content correspondence is established in order to
identify similarly-rated contents in both source and target
domains. In the second phase, an optimization problem is
formulated by combining the source and target domains for
knowledge transfer, to jointly learn the popularity matrix
$\mathbf{P}_{tar}$ in the target domain. In this regard, we suppose
that each of the source and target domains corresponds to an
information system $s$ that is made of $N_s$ users and $F_s$
contents given by $\mathcal{N}_s$ and $\mathcal{F}_s$ respectively.
In each system $s$, we observe $\mathbf{P}_s$ with entries
$P_{s,ij}$. Let $\mathcal{R}_s = \{(i,j,r) : r = P_{s,ij}, P_{s,ij}
\neq 0\}$ represent the set of observed user ratings in each
system, and let the set of shared contents be given by
$\mathcal{F}$. Moreover, let $\mathcal{N}^* = \mathcal{N}_{D2D} \cup
\mathcal{N}_{tar}$ and $\mathcal{F}^* = \mathcal{F}_{D2D} \cup
\mathcal{F}_{tar}$ be the union of the collections of users and
contents, respectively, where $N^* = |\mathcal{N}^*|$ and $F^* =
|\mathcal{F}^*|$ represent the total number of unique users and
contents in the union of both systems.

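In the proposed TL approach, we model the users $\mathcal{N}^*$ and
contents $\mathcal{F}^*$ by a user factor matrix $\mathbf{N} \in
\mathbb{R}^{k \times N^*}$ and a content factor matrix $\mathbf{F}
\in \mathbb{R}^{k \times F^*}$, where the $i$-th and $j$-th columns
of these matrices are given by $\mathbf{n}_i$ and $\mathbf{f}_j$,
respectively. The aim is to approximate the popularity matrix
$\mathbf{P}_s \approx \mathbf{N}_s^T \mathbf{F}_s$ by jointly
learning the factor matrices $\mathbf{N}$ and $\mathbf{F}$. This is
formally done by minimizing the following cost function:

$$\underset{\mathbf{N}, \mathbf{F}}{\text{minimize}} \sum_{s} \alpha_s \sum_{(i,j) \in \mathbf{P}_s} \left( \mathbf{n}_i^T \mathbf{f}_j - P_{s,ij} \right)^2 + \mu \left( \|\mathbf{N}\|_F^2 + \|\mathbf{F}\|_F^2 \right) \qquad (8)$$

where the parameter $\alpha_s$ is the weight of each system. By
doing so, $\mathbf{P}_{D2D}$ and $\mathbf{P}_{tar}$ are jointly
factorized, and thus the sets of factor matrices $\mathbf{F}_{D2D}$
and $\mathbf{F}_{tar}$ become interdependent, as the features of a
shared content are similar for knowledge sharing. A practical
TL-based caching procedure is sketched in Fig. 2.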



Figure 2: An illustration of the proposed TL-based caching 
procedure. 
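
The joint factorization idea can be sketched as follows, under several assumptions that are not taken from the paper: ratings of both systems are already re-indexed to the union of users and contents (i.e., the content correspondence is given), each system's squared error is scaled by its weight, the solver is plain SGD with the regularizer applied per observed rating, and the toy data and hyperparameters are invented for illustration.

```python
import random


def joint_factorize(systems, num_users, num_contents, k=1, mu=0.01,
                    lr=0.01, epochs=2000, seed=0):
    """Jointly fit shared user and content factors over several systems.

    systems: list of (alpha_s, ratings_s); ratings_s holds (i, j, r) triples
    whose indices already refer to the union of users and contents.
    """
    rng = random.Random(seed)
    N = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(num_users)]
    F = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(num_contents)]
    for _ in range(epochs):
        for alpha, ratings in systems:
            for i, j, r in ratings:
                err = sum(a * b for a, b in zip(N[i], F[j])) - r
                for d in range(k):
                    n_id, f_jd = N[i][d], F[j][d]
                    # Weighted error gradient plus the L2 penalty
                    N[i][d] -= lr * (alpha * err * f_jd + mu * n_id)
                    F[j][d] -= lr * (alpha * err * n_id + mu * f_jd)
    return N, F


# Source domain: users 0 and 1 rated all three shared contents
source = [(0, 0, 1.0), (0, 1, 2.0), (0, 2, 3.0),
          (1, 0, 2.0), (1, 1, 4.0), (1, 2, 6.0)]
# Target domain: user 2 is nearly cold-start, with a single known rating
target = [(2, 0, 1.5)]

N, F = joint_factorize([(1.0, source), (1.0, target)], num_users=3, num_contents=3)
# Predicted popularity of every content for the target user
pred = [sum(a * b for a, b in zip(N[2], F[j])) for j in range(3)]
```

Because the content factors are shared, the single target rating is enough to rank the remaining contents for the target user in this toy example, which is the kind of knowledge transfer the joint cost function is meant to enable.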

IV. Numerical Results and Discussion 
The objective of this section is to validate the effectiveness 
of the proposed TL caching procedure and draw key insights. 
In particular, we consider the following caching policies for 
comparison: 

1) Ground Truth: Given the perfect rating matrix $\mathbf{P}_{tar}$,
the most popular contents are stored greedily.

2) Random caching [?]: Contents are cached uniformly at random.

3) Collaborative Filtering [13]: The content popularity matrix
$\mathbf{P}_{tar}$ is estimated via CF from a training set with
4% of ratings. Then, the most popular contents are stored
accordingly.

4) Transfer Learning: $\mathbf{P}_{tar}$ and $\mathbf{P}_{D2D}$
matrices are jointly factorized via TL by using a training set
with 12% of ratings and perfect user-content correspondence. Then
the most popular contents are stored accordingly.

Figure 3: Evolution of the aggregate backhaul load and users’ satisfaction ratio.
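
Once a popularity matrix is available (perfect or estimated), storing the most popular contents greedily could look like the sketch below. Aggregating per-user popularity by summing over users, the unit content sizes and the function names are assumptions made for this example; the paper does not spell out these details.

```python
def content_popularity(P):
    """Aggregate a users-by-contents matrix into one score per content."""
    num_contents = len(P[0])
    return [sum(row[f] for row in P) for f in range(num_contents)]


def greedy_cache(popularity, content_sizes, storage):
    """Cache contents in decreasing popularity order while they still fit."""
    order = sorted(range(len(popularity)), key=lambda f: popularity[f], reverse=True)
    cached, used = [], 0
    for f in order:
        if used + content_sizes[f] <= storage:
            cached.append(f)
            used += content_sizes[f]
    return cached


# Hypothetical estimated popularity for 3 users and 4 contents
P_est = [
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.2, 0.6, 0.1],
]
cached = greedy_cache(content_popularity(P_est), [1, 1, 1, 1], storage=2)
```

Here contents 1 and 2 have the highest aggregate scores, so they fill the two available storage units.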

In the numerical setup, having contents cached according to 
these policies, the SBSs serve their users according to a traffic 
arrival process. This process is drawn from a Poisson process 
with intensity A. The storage size of SBSs, content lengths, 
capacities of non-interfering wireless and backhaul links are 
assumed to have same constant values individually, in order 
to showcase the performance of the caching policies. The 
numerical results of users’ satisfaction ratio and backhaul load 
are obtained by averaging out 1000 Monte-Carlo realizations. 
The simulation parameters are summarized in Table I unless
stated otherwise.
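
The Poisson traffic arrival process mentioned above can be simulated by accumulating exponentially distributed inter-arrival times. This is a generic sketch rather than the authors' simulator; the function name and the seed are illustrative, while the intensity of 1 demand/s and the horizon of 128 seconds follow the defaults of Table I.

```python
import random


def poisson_arrivals(intensity, horizon, rng):
    """Arrival times of a homogeneous Poisson process on [0, horizon)."""
    times = []
    t = rng.expovariate(intensity)  # first inter-arrival time
    while t < horizon:
        times.append(t)
        t += rng.expovariate(intensity)
    return times


rng = random.Random(1)
arrivals = poisson_arrivals(1.0, 128.0, rng)
```

Each arrival time can then be paired with a content drawn from the popularity distribution to form one request of the trace.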

The dynamics of users’ satisfaction ratio and backhaul 
load with respect to the storage size, demand shape in the 
source domain, traffic intensity and backhaul capacity are 
given in Fig. 3. The results are normalized to show the various
percentage gains, whereas the actual values are shown in Table I.
In the following, we discuss in detail the impact of these
parameters.

Table I: Simulation Parameters

Parameter        Description                    Default, [Varied] Values
$M_{tar}$        Number of SBSs                 4
$N_{tar}$        Number of UTs                  32
$F_{tar}$        Library size                   32 contents
$L$              Content length                 1 MBit
$B$              Bitrate requirement            1 MBit
$\sum_m C'_m$    Total wireless capacity        32 MBit/s
$T$              Time slots                     128 seconds
$\alpha$         Zipf parameter                 2
$\beta$          CRP concentration parameter    2, [2 ~ 100]
$\sum_m S_m$     Total storage size             6, [0 ~ 32] MBit
$\sum_m C_m$     Total backhaul capacity        1, [1 ~ 8] MBit/s
$\lambda$        Traffic intensity              1, [1 ~ 3] demand/s

1) Impact of the storage size ($S_m$): The storage size is indeed
one of the crucial parameters in cache-enabled SBSs, and it is
expected that higher storage sizes result in better performance in
terms of satisfaction ratio and backhaul offloading. According to
this setup, we would like to note that the biggest improvement in
satisfaction ratio and decrement in the backhaul load is achieved
by the ground truth baseline, where the content popularity is
perfectly known. The random approach on the other hand has the
worst-case performance. The CF approach exhibits similar
performance to the random approach due to the cold-start problem,
whereas the satisfaction ratio and backhaul offloading gains of TL
are close to the ground truth baseline. In particular, it is shown
that the TL policy outperforms its CF counterpart, with
satisfaction and backhaul offloading gains up to 22% and 5%
respectively.

2) Impact of the demand shape in the source domain ($\beta$): The
demand shape in the source domain, characterized by the CRP
concentration parameter $\beta$, provides meaningful insights to
our problem. In fact, as $\beta$ increases, the demand shape tends
to be more uniform, requiring higher storage sizes at the SBSs to
sustain the same performance. In a storage limited case, we see
that the satisfaction ratio decreases and the backhaul load
increases with the increment of $\beta$. Compared to the CF
approach, the gains of TL are around 6% for the satisfaction gains
and 22% for the backhaul offloading. However, the gap between TL
and CF becomes smaller as $\beta$ increases.

3) Impact of the traffic intensity ($\lambda$): As the average
number of request arrivals per time slot increases, bottlenecks in
the network are expected to occur due to the limited resources of
SBSs, resulting in lower satisfaction ratios. This is visible in
the high arrival rate regime, whereas the relative backhaul load
remains constant. It can be seen that the ground truth caching
with perfect knowledge of content popularity outperforms the other
policies while the random approach has the worst performance. On
the other hand, the performance of TL lies in between these
approaches, with up to 3% satisfaction gains and 18% backhaul
offloading gains compared to CF.

4) Impact of the backhaul capacity ($C_m$): The total backhaul
capacity is assumed to be sufficiently smaller than the capacity
of wireless links. The increment of this capacity clearly results
in higher satisfaction ratios in all cases. Note that any content
not available in the caches of SBSs is delivered via the backhaul.
Therefore, increasing the backhaul capacity avoids the bottlenecks
during the delivery, thus yielding higher users’ satisfaction. On
the other hand, the backhaul load remains constant in this
setting. It can be seen that the TL approach has satisfaction
ratio gains of up to 6% and backhaul offloading of up to 5%
compared to the CF approach.
Figure 4: Evolution of the backhaul load with respect to the
perfect correspondence ratio.

5) Impact of source-target correspondence: We have so far assumed
that the user/content correspondence between the target and source
domains is perfect. This is a strong assumption and such an
operation requires a more careful treatment to avoid negative
transfer. Here, we relax this assumption by introducing a perfect
correspondence ratio. This ratio represents the amount of perfect
user/content matching between both source and target domains. A
ratio of 0 means that 100% of correspondence is done uniformly at
random and 1 is equivalent to the perfect case. It is shown in
Fig. 4 that TL has a poor performance in the low values of this
ratio, with similar performance as the random caching due to the
negative transfer. However, as this ratio increases, the
performance of TL improves, outperforming the CF with a ratio of
0.58. This underscores the importance of such an operation for the
positive transfer and is left for future work.
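
One possible reading of the perfect correspondence ratio is sketched below: a fraction of the source contents keeps its exact match in the target domain and the remaining contents are re-matched uniformly at random among themselves. How mismatches were generated in the experiments is not specified, so this construction, the function name and the rounding rule are assumptions.

```python
import random


def noisy_correspondence(num_contents, ratio, rng):
    """Map each source content to a target content; `ratio` of them stay exact."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    mapping = list(range(num_contents))  # perfect correspondence
    num_random = num_contents - round(ratio * num_contents)
    scrambled = rng.sample(range(num_contents), num_random)  # contents to re-match
    targets = scrambled[:]
    rng.shuffle(targets)  # uniformly random re-matching among them
    for src, tgt in zip(scrambled, targets):
        mapping[src] = tgt
    return mapping


perfect = noisy_correspondence(32, 1.0, random.Random(0))
half = noisy_correspondence(32, 0.5, random.Random(0))
```

Feeding such a mapping into the joint factorization before re-indexing the source ratings would reproduce the kind of experiment described here, with lower ratios injecting more misleading side information.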

V. Conclusions 

We proposed a novel transfer learning-based caching procedure
which was shown to yield higher users’ satisfaction and backhaul
offloading gains, overcoming the data sparsity and cold-start
problems. Numerical results confirmed that the overall performance
can be improved by transferring judiciously-extracted knowledge
from a source domain to a target domain via TL. An interesting
direction for future work is assessing the performance of TL-based
caching using real traces. Another avenue of research is extending
the current model to predictive scheduling and predictive
offloading.